Unsupervised speaker normalization using canonical correlation analysis

Authors

Yasuo Ariki

Miharu Sakuragi

Abstract

Conventional speaker-independent HMMs ignore speaker differences and pool the speech data in a single observation space. As a result, the output probability distributions of the HMMs become broad, which degrades recognition accuracy. To solve this problem, we construct a speaker subspace for each individual speaker and correlate the subspaces of the standard speaker and the input speaker by o-space canonical correlation analysis. Supervised normalization requires input speakers to speak the same sentences as the standard speaker. To remove this constraint, we propose in this paper an unsupervised speaker normalization method that automatically segments the speech data into phoneme data with the Viterbi decoding algorithm and then associates the mean feature vectors of the phoneme data by o-space canonical correlation analysis. We show that the phoneme recognition rate of this unsupervised method is equivalent to that of the supervised normalization method we proposed previously.
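
As a rough illustration of the association step described in the abstract, the sketch below applies ordinary canonical correlation analysis to the per-phoneme mean feature vectors of two speakers. It is not the authors' implementation: the synthetic features, the frame-level phoneme labels (which stand in for the output of a Viterbi segmentation), the ridge term `reg`, and all function and variable names are assumptions made for illustration. Plain CCA is used because the abstract does not give the details of the subspace variant.

```python
import numpy as np


def inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def cca(x, y, reg=1e-6):
    """Plain CCA on paired rows of x (n, p) and y (n, q).

    Returns projection matrices a, b and the canonical correlations,
    sorted from largest to smallest. `reg` is a small ridge term
    (an assumption) that keeps the covariance matrices invertible.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    n = x.shape[0]
    sxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    syy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    sxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(sxx), inv_sqrt(syy)
    # Singular values of the whitened cross-covariance are the correlations.
    u, s, vt = np.linalg.svd(wx @ sxy @ wy, full_matrices=False)
    return wx @ u, wy @ vt.T, s


def phoneme_means(frames, labels, n_phonemes):
    """Mean feature vector of the frames assigned to each phoneme label."""
    return np.stack([frames[labels == k].mean(axis=0) for k in range(n_phonemes)])


# Synthetic stand-in data (hypothetical): 30 phonemes, 4-dim features,
# 20 frames per phoneme. In the paper the labels come from Viterbi decoding.
rng = np.random.default_rng(0)
n_phonemes, dim, frames_per = 30, 4, 20
labels = np.repeat(np.arange(n_phonemes), frames_per)
true_means = rng.normal(size=(n_phonemes, dim))
speaker_map = np.eye(dim) + 0.3 * rng.normal(size=(dim, dim))

std_frames = true_means[labels] + 0.05 * rng.normal(size=(labels.size, dim))
inp_frames = true_means[labels] @ speaker_map + 0.05 * rng.normal(size=(labels.size, dim))

# Pair the two speakers by phoneme identity, not by identical sentences.
std_means = phoneme_means(std_frames, labels, n_phonemes)
inp_means = phoneme_means(inp_frames, labels, n_phonemes)

a, b, corrs = cca(inp_means, std_means)
inp_variates = (inp_means - inp_means.mean(axis=0)) @ a
std_variates = (std_means - std_means.mean(axis=0)) @ b
print("canonical correlations:", corrs)
```

Because the rows are paired by phoneme label, the two speakers do not need to have spoken the same sentences, which is the constraint the unsupervised method is meant to remove.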

Similar resources

Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition

Probabilistic Linear Discriminant Analysis (PLDA) continues to be the most effective approach for speaker recognition in the i-vector space. This paper extends the PLDA model to include both enrollment and test cut duration as well as to distinguish between session and channel variability. In addition, we address the task of unsupervised adaptation to unknown new domains in two ways: speaker-de...

Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Generally speaking, the speaker dependence of a speech recognition system stems from speaker-dependent speech features. Variation in vocal tract length and/or shape is one of the major sources of inter-speaker variation. In this paper, we address several methods of vocal tract length normalization (VTLN) for large vocabulary continuous speech recognition: (1) explore the bilinear warping VTL...

Improved Speaker Markov Modelling for Unsupervised Speaker Normalization

We propose new methods for improving speech recognition with speaker-variable information. Hidden Markov Model-based recognizers that are trained by reference speaker(s) (RS) are normalized by our two different approaches to give a better speaker-independent recognition rate. Our normalization methods are based on the same principle of inter-speaker Markov mapping. This mapping gives inter-speak...

Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification

This paper proposes a new approach to unsupervised speaker adaptation inspired by the recent success of the factor analysis-based Total Variability Approach to text-independent speaker verification [1]. This approach effectively represents speaker variability in terms of low-dimensional total factor vectors and, when paired alongside the simplicity of cosine similarity scoring, allows for easy m...

Eliminating Inter-speaker Variability Prior to Discriminant Transforms

This paper shows the impact of speaker normalization techniques such as vocal tract length normalization (VTLN) and speaker-adaptive training (SAT) prior to discriminant feature space transforms, such as LDA. We demonstrate that removing the inter-speaker variability by using speaker compensation methods results in improved discrimination as measured by the LDA eigenvalues and also in improved ...

Publication date: 1998
